Method And System For Acquiring Data From Machine-Readable Documents

ABSTRACT

In a method for acquiring data from a machine-readable document for assignment to fields of a database, individual data are extracted substantially automatically from the document and entered into the corresponding database fields. If data cannot be extracted from the document with a desired degree of reliability for one or more particular database fields, the following steps are executed: displaying the document on a display screen, displaying on the display screen the one or more database fields for which the data cannot be extracted with the desired degree of reliability, and executing a proposal routine with which string sections in the vicinity of a pointer movable by a user on the display screen are selected, marked, and proposed for extraction.

BACKGROUND

The preferred embodiment relates to a method and a system for acquiring data from machine-readable documents, the data being assigned to a database, in which individual data are extracted from the document as automatically as possible and are entered into corresponding database fields. The method and system relate in particular to the acquisition of data in the case in which data cannot be extracted with the necessary degree of reliability for one or more particular database fields of a document.

Methods and systems for acquiring data from machine-readable documents are known. In the standard situation, the systems have a scanner with which documents are optically scanned. The data files produced in this way are machine-readable documents, and as a rule contain text elements. The text elements are converted into coded text with the aid of an OCR device. As a rule, predetermined forms or templates are assigned to the data files, so that particular items of information can be determined from the text of the data files in a targeted manner on the basis of the forms. These items of information are stored, for example, in a database.

Methods and systems of this sort are used, for example, in large firms in order to read invoices. The data extracted in this way can be communicated automatically to an accounting software program.

Such a system is described in U.S. Pat. No. 4,933,979. This system has a scanner for the optical scanning of forms. In this system, a large number of types of forms can be defined, each type of form or template being defined by a plurality of parameters, in particular geometrically defined areas in which texts or images are to be contained. The form types can also be defined by additional characteristics, such as for example the type of script contained in the texts (letters, numbers, symbols, katakana, kanji, handwriting). After a form has been scanned, a template is assigned to the scanned form using a form type distinguishing device. Correspondingly, the data contained in the text field are read and extracted using an OCR device. If no suitable template exists, it is necessary to create one.

From WO 98/47098, another system is known for the automatic acquisition of data from machine-readable documents. Here, a scanner is used to optically scan forms. Subsequently, a line map of the form is created automatically. Here, on the one hand all lines are acquired, and all graphic elements are converted into a line structure. Other elements, such as for example text sections, are filtered out. All vertical lines form the basis for creating a vertical key, and all horizontal lines form the basis for creating a horizontal key. Subsequently, it is determined whether a template already exists having a corresponding vertical and horizontal key. If this is the case, the data are read out using a corresponding template. If this is not the case, then on the basis of the scanned-in form a template is created and stored using a self-learning mode.

In the book Modern Information Retrieval by Baeza-Yates and Ribeiro-Neto, Addison-Wesley Press, ISBN 0-201-39829-X, the basic principles of databases and of information stored for rapid finding in databases are explained. Thus, in Chapter 8.2, a method using inverted data files, also designated an inverted index, is described. In this method, a dictionary of all the words contained in a text that is to be examined is first created. Each word in the dictionary is assigned one or more numbers that indicate the locations at which the word occurs in the text. Such inverted data files enable a more rapid automatic analysis of a text that is to be searched. In Chapter 8.6.1, a string matching method is described in which two strings are compared and a cost measure is calculated that is inversely related to the similarity of the strings. If the two strings are identical, the magnitude of the cost measure is zero. The more the strings differ, the greater is the magnitude of the cost measure. The cost measure is thus an expression of the similarity of the two strings. This and similar methods are also known under the names approximate string matching, Levenshtein method, elastic matching, and Viterbi algorithm. These methods are part of the field of dynamic programming.
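The two techniques summarized above can be illustrated with a minimal sketch. It is not taken from the cited book or from the patent; the function names `build_inverted_index` and `edit_distance` are hypothetical, and the cost measure shown is the plain Levenshtein distance with unit costs, which is only one of the methods named.

```python
from collections import defaultdict


def build_inverted_index(text):
    """Map each lowercased word to the list of word positions where it occurs."""
    index = defaultdict(list)
    for position, word in enumerate(text.lower().split()):
        index[word].append(position)
    return dict(index)


def edit_distance(a, b):
    """Levenshtein distance by dynamic programming; 0 means identical strings."""
    # previous[j] holds the cost of turning the processed prefix of a into b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (ch_a != ch_b)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]
```

The cost is zero for identical strings and grows with each character that must be inserted, deleted, or substituted, which matches the behavior of the cost measure described in the text.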

In the not-yet-published patent application DE 103 42 594.2, a method and a system for acquiring data from a plurality of machine-readable documents are described in which, from a document that is to be processed (the read document), data are extracted by reading them out at positions in the read document that are determined by fields entered in a master document.

If an error occurs during the reading out of the read documents, the read document is displayed on a display screen and the data can be read out only by marking corresponding fields in the read document. Here, if it is required, additional master documents are automatically produced on the basis of the marked read documents, or existing master documents are correspondingly corrected. This system is easy enough to use that no special computer or software knowledge is necessary.

A method that supports an operator in the generation of electronic templates for a form recognition system is known from U.S. Pat. No. 5,317,646. For this, a form not provided with data (what is known as a master form) is shown on a screen, and the user can identify the data fields with a pointer device. The coordinates that bound the corresponding region are automatically detected after a single point within this region has been selected by the operator. Templates for automatic form recognition can be created simply and quickly with this method.

In Casey R. G. et al., “Intelligent Forms Processing”, IBM Systems Journal, vol. 29 (1990), No. 3, pages 435 through 450, a form recognition method is described in which a scanned-in form is analyzed by means of image processing techniques and is compared with other stored template forms. In the event that no correlation with a template form is found, a new template form must be generated via input on a computer. In the generation of a template, the scanned form is shown on the screen and the boundary lines of the input fields are marked with a pointer device.

A two-stage method in which form templates can be initially input and documents can be automatically read out using the input form templates is known from US 2002/141660 A1. Form templates to be input are scanned, and the operator indicates input fields with a cursor. The position and size of the input fields are stored. The operator can also determine the data type associated with each data field. In automatic reading, the forms are scanned in and automatically read out using the data fields contained in the stored form documents. In the event that an error occurs in the readout, the operator can correct the errors via the keyboard.

U.S. Pat. No. 6,028,970 concerns a method and a system for automatic text recognition (OCR). The system comprises an error correction module (“error correction logic module”). This error correction module is applied to clearly detectable data errors in order to correct them. These corrections are executed automatically. Not only are errors in individual letters detected; errors in context are also analyzed and correspondingly corrected. An error that cannot be automatically corrected can be communicated to the operator by means of an error message. The operator can then assess and, if applicable, correct the text generated by means of the text recognition.

It is an object to create a method and a system for acquiring data from machine-readable documents in which the inputting of the data is significantly simplified in comparison with the known methods in cases in which data cannot be automatically extracted.

In a method for acquiring data from a machine-readable document for assignment to fields of a database, individual data are extracted substantially automatically from the document and entered into the corresponding database fields. If data cannot be extracted from the document with a desired degree of reliability for one or more particular database fields, the following steps are executed: displaying the document on a display screen, displaying on the display screen the one or more database fields for which the data cannot be extracted with the desired degree of reliability, and executing a proposal routine with which string sections in the vicinity of a pointer movable by a user on the display screen are selected, marked, and proposed for extraction.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a method for acquiring from a document data that cannot be extracted automatically;

FIGS. 2-6 each show copies of display screen representations corresponding to individual method steps of the method indicated in FIG. 1;

FIG. 7 shows a method for extracting data arranged in tables;

FIGS. 8, 9 each show a table with marked data; and

FIG. 10 shows a system for executing the method according to the preferred embodiment.

DESCRIPTION OF THE PREFERRED EMBODIMENT

For the purposes of promoting an understanding of the principles of the invention, reference will now be made to the preferred embodiment illustrated in the drawings and specific language will be used to describe the same. It will nevertheless be understood that no limitation of the scope of the invention is thereby intended, such alterations and further modifications in the illustrated device, and/or method, and such further applications of the principles of the invention as illustrated therein being contemplated as would normally occur now or in the future to one skilled in the art to which the invention relates.

With the methods explained above, data can be acquired from a plurality of machine-readable documents, the data being assigned to a database in that individual data are extracted from the document as automatically as possible and are entered into corresponding database fields. If data cannot be extracted with the necessary degree of reliability for one or more particular database fields of a document, for example because an error has been determined, caused for example by the fact that no data or false data are present in the document at the point at which the data are to be read, or that during the reading in of this document using an OCR method one or more characters are falsely converted, then according to the preferred embodiment the following steps are executed:

-   displaying of the document on a display screen,
-   indication on the display screen of the database field for which the data cannot be extracted with the necessary degree of reliability,
-   execution of a proposal routine with which string sections in the vicinity of a pointer on the display screen that can be moved by a user are selected, marked, and proposed for extraction.

The document is displayed on the display screen so that the user can read it. In addition, the database field is indicated for which the data cannot be extracted with the necessary degree of reliability. In this way, the user is informed of the database field for which the data must still be extracted from the document shown on the display screen.

Through the execution or activation of the proposal routine, string sections in the vicinity of a pointer, movable on the display screen by the user, can be selected, marked, and proposed for extraction. In this way, the user need merely move the pointer on the document shown on the display screen into the vicinity of a string section that contains the data for the indicated database field. The data are then automatically selected, marked, and proposed for extraction. The user can then transfer or incorporate the proposed string section into the database field merely by actuating a particular key.

Through the automatic selecting and marking of the string section, the process of incorporating the still-missing data is significantly simplified and accelerated.

According to a preferred specific embodiment of the present invention, during the selection of the string section, concept or design information that is assigned to the respective database field is taken into account.

The method according to the preferred embodiment for acquiring data from machine-readable documents is a development of the methods described above with which data can be extracted from documents and stored in a database by machine.

However, in these methods it is not always possible to fill all database fields of the database reliably with data extracted from the documents. If, for example, there is an error during the extraction of the data, the automatic method is interrupted and, with the cooperation of the user, the data from the document are manually entered into database fields. Such an error can result from the fact that in the document to be processed no suitable string section is found from which the data can be read, or the string section contains erroneous data that arise for example during the conversion of the document into coded text using an OCR method.

The method according to the preferred embodiment thus begins when data cannot be reliably extracted. The expression “not reliably extractable” includes both fundamental errors in the reading of data that make a reading of the data impossible, and also read data that are mapped to the database field while taking into account context information, the quality of the mapping being determined during this process. Such mapping methods include for example the string matching method named above. If the mapping quality achieved here is too low, the automatically read-in data are evaluated as insufficiently reliable and are rejected.

In the following, the method according to the preferred embodiment is explained on the basis of the flow diagram shown in FIG. 1. In the flow diagram, all steps that are executed automatically are identified with an “a” in a circle, and all steps that are to be carried out manually by the user are identified with an “m” in a circle.

The method begins with step S1.

When data for at least one database field cannot be extracted with the necessary degree of reliability, the corresponding document 1 is displayed on a display screen 2, and the database field 3 is indicated (step S2). FIG. 2 shows a display screen representation immediately after the determination that data could not be extracted with the necessary degree of reliability; here the document 1 is shown in a window 4/1 on the right side of the display screen representation. Two windows 4/2 and 4/3 are situated on the left side. Window 4/2 contains an overview of the documents that are to be processed, and in window 4/3 the individual database fields are indicated in which data are stored that are to be read from document 1.

In the example shown, none of the database fields could be filled with data, for which reason the individual database fields 3 are provided with the designation “empty”. However, it is also possible for data to be missing only in a few database fields, or only in a single database field.

In FIG. 2, the database field “InvoiceNumber” is marked darker in comparison to the other database fields 3, which indicates to the user that data are to be extracted from document 1 for this database field 3. In addition, in the upper area of window 4/1 the term “InvoiceNumber” is indicated in a larger font, additionally indicating to the user the database field for which data are to be extracted.

In window 4/1, the user can now position a pointer 5, preferably as close as possible to the string section whose content the user assumes is to be stored in the corresponding database field. In the example shown in FIG. 2, data are to be extracted for the database field “InvoiceNumber,” so pointer 5 is positioned in the vicinity of invoice number “4361” (step S3).

Here, pointer 5 can be moved in window 4/1 using a mouse 6 or via inputs on a keyboard 7.

After the positioning of pointer 5, a proposal routine begins that comprises a plurality of method steps. This proposal routine can be initiated in one of two ways: either pointer 5 is not moved for a predetermined time interval, whereupon the proposal routine is automatically executed, or the user actuates a particular mouse button or keyboard key.
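The two ways of initiating the proposal routine can be sketched as a small state holder. This is an illustrative assumption, not the implementation of the preferred embodiment: the class name `DwellTrigger`, its methods, and the default interval of 0.5 seconds are all hypothetical, and time stamps are passed in explicitly so that the logic does not depend on any particular user interface toolkit.

```python
class DwellTrigger:
    """Decides when the proposal routine should start (hypothetical helper)."""

    def __init__(self, dwell_seconds=0.5):
        self.dwell_seconds = dwell_seconds  # assumed "predetermined time interval"
        self.last_move = None               # time of the most recent pointer movement
        self.fired = False                  # True once the routine ran for this position

    def on_pointer_move(self, now):
        """Record a pointer movement; this restarts the dwell interval."""
        self.last_move = now
        self.fired = False

    def on_key_or_button(self):
        """Explicit initiation by a particular mouse button or keyboard key."""
        self.fired = True
        return True

    def poll(self, now):
        """Return True exactly once when the pointer has rested long enough."""
        if self.last_move is None or self.fired:
            return False
        if now - self.last_move >= self.dwell_seconds:
            self.fired = True
            return True
        return False
```

A user interface loop would call `on_pointer_move` on every pointer event and `poll` periodically, starting the proposal routine whenever either `poll` or `on_key_or_button` returns `True`.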

In step S4, it is first checked whether there is located in the immediate vicinity of the pointer a string section having a concept suitable for database field 3, insofar as concept information has been previously assigned to the corresponding type of the database field. This concept information includes the syntax and/or the semantics of the database field. Information concerning syntax includes for example the number of numerals and/or letters and/or specified formats of the string section that is to be read. Thus, date fields, amount fields, and address fields have as a rule particular formats. Semantic information includes specified terms that can be entered into the corresponding database field. This is useful for example for currency indications, or if the article designation of a particular supplier that can supply a limited number of articles is to be read in. The corresponding article designations are then stored in a lexicon and can then be unambiguously recognized.
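One way to picture concept information is as a table that assigns each database field either a syntax pattern or a lexicon of permitted terms. The sketch below is an assumption made for illustration: the dictionary `CONCEPTS`, the concrete regular expressions, the currency lexicon, and the function `matches_concept` are hypothetical and are not specified in the source.

```python
import re

# Hypothetical concept table: "syntax" is a format pattern, "lexicon" a set of terms.
CONCEPTS = {
    "InvoiceNumber": {"syntax": re.compile(r"\d{3,10}")},            # assumed: 3-10 numerals
    "InvoiceDate": {"syntax": re.compile(r"\d{2}\.\d{2}\.\d{4}")},   # assumed: DD.MM.YYYY
    "Currency": {"lexicon": {"EUR", "USD", "GBP"}},                  # assumed term list
}


def matches_concept(field, candidate):
    """Return True if the candidate string fits the concept assigned to the field.

    Returns False when no concept information is assigned, so that a caller
    can fall back to selection by general boundary rules.
    """
    concept = CONCEPTS.get(field)
    if concept is None:
        return False
    syntax = concept.get("syntax")
    if syntax is not None and not syntax.fullmatch(candidate):
        return False
    lexicon = concept.get("lexicon")
    if lexicon is not None and candidate not in lexicon:
        return False
    return True
```

With these assumed patterns, `matches_concept("InvoiceNumber", "02.08.2002")` is false because the string has the syntax of a date, while `matches_concept("InvoiceNumber", "4361")` is true.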

In the exemplary embodiment shown in FIG. 2, the two string sections “4361” and “02.08.2002” are situated in the vicinity of pointer 5. The latter string section has the syntax of a date, and for this reason it is rejected for the extraction of the invoice number. The string section “4361” corresponds to the syntax of an invoice number. Therefore, in step S4 it is decided that a string section having a suitable concept is present, and for this reason the method sequence next goes to step S5. In step S5, the string section “4361” is marked (FIG. 3). In the present exemplary embodiment, the marking takes place through a colored highlighting or background of the string section and through the drawing in of a frame 8.
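The decision in steps S4 and S5 can be sketched as a search for the nearest string section that fits the field's syntax. The data layout is an assumption: `StringSection`, the pixel coordinates, the `max_distance` radius, and the pattern `INVOICE_NUMBER` are hypothetical names and values chosen for illustration only.

```python
import math
import re
from dataclasses import dataclass


@dataclass
class StringSection:
    text: str
    x: float  # assumed: horizontal center of the section on screen
    y: float  # assumed: vertical center of the section on screen


INVOICE_NUMBER = re.compile(r"\d{3,10}")  # assumed syntax for the example field


def propose_by_concept(sections, pointer, syntax, max_distance=100.0):
    """Return the section nearest the pointer whose text fits the syntax, or None."""
    px, py = pointer
    best, best_distance = None, max_distance
    for section in sections:
        if not syntax.fullmatch(section.text):
            continue  # wrong concept, e.g. a date where a number is expected
        distance = math.hypot(section.x - px, section.y - py)
        if distance <= best_distance:
            best, best_distance = section, distance
    return best
```

In this sketch a date lying closer to the pointer than the invoice number is still skipped, which mirrors the rejection of “02.08.2002” described above. A result of `None` corresponds to the case in which no suitable concept is found and selection by general rules takes over.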

If in step S4 no suitable concept is determined, the method sequence goes to step S6. In step S6, the individual character situated closest to pointer 5 is determined, which, in the present exemplary embodiment according to FIGS. 2-4, is the “1.” Subsequently, the boundaries of the string section containing this character are determined according to general rules. These boundaries can for example be determined by empty characters or empty spaces in the document 1, or by particular punctuation marks or other markings in document 1. If corresponding boundary markings are recognized, the string section situated between them is selected and marked. In the exemplary embodiment shown in FIGS. 2 and 3, empty spaces are situated on each side of string section “4361”, so that an unambiguous selection and marking of the string section is possible according to the general rules as well.
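The fallback of step S6 can be sketched on a single line of coded text: starting from the character nearest the pointer, the selection grows in both directions until a boundary character is reached. The function name `expand_to_boundaries` and the default delimiter set (space and tab) are assumptions; the source also allows punctuation marks or other markings as boundaries.

```python
def expand_to_boundaries(line, index, delimiters=" \t"):
    """Return (start, end) of the string section around line[index], or None.

    `index` is the position of the character closest to the pointer. The
    section is bounded by the nearest delimiter on each side, or by the
    ends of the line. None is returned if the index is out of range or
    points at a delimiter.
    """
    if not 0 <= index < len(line) or line[index] in delimiters:
        return None
    start = index
    while start > 0 and line[start - 1] not in delimiters:
        start -= 1
    end = index + 1
    while end < len(line) and line[end] not in delimiters:
        end += 1
    return start, end


line = "Invoice No. 4361 02.08.2002"
bounds = expand_to_boundaries(line, 15)  # index 15 is the "1" of "4361"
```

Here `line[bounds[0]:bounds[1]]` yields the string section “4361”, because empty spaces lie on both sides of it.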

Independent of whether the string section has been selected or marked according to step S5 or according to step S6, the method sequence goes to method step S7, with which the string section is displayed in an additional frame 9 as a coded text, and is displayed in an enlarged fashion in another frame 10 (FIGS. 3, 4). In the present exemplary embodiment, document 1 is present as a graphic data file, e.g., in the .pdf, .tif, .gif, or .jpg format. Normally, in the preceding method segment the document was subjected to an OCR routine and converted into coded text. The coded text is here also examined for concepts, and the corresponding information is stored. The section corresponding to the string section is removed from this coded text and is shown in frame 9. In this way, the user recognizes whether the string section has been correctly converted into coded text.

In frame 10, the string section is shown in a graphic format in an enlarged representation, so that the user can also recognize details in the string section.

In step S7, the proposal routine is terminated.

In step S8, the user judges whether the selected and marked string section is fundamentally suitable for transferring into the database field. If this is not the case, pointer 5 is repositioned (S3) and the proposal routine (S4-S7) is executed again. If, in contrast, the selection of the string section is fundamentally suitable, the user judges whether the marked area is also correct (step S9). If this is not the case, the user can manually process the marking of the string section and/or can edit the coded text in frame 9 (step S10). With the editing of the coded text, errors resulting from an incorrect OCR conversion can be removed. When these corrections (adapt area, edit) are made, the marked area and the contents of frames 9 and 10 are automatically adapted.

If the marked area is correct or has been correspondingly revised by the user, the method sequence moves to step S11, in which the data contained in the selected string section are transferred into the corresponding database field (FIG. 4). This transferring of the data is initiated by user actuation of a predetermined mouse button or key on the keyboard. Subsequently, the method for extracting data for a database field is terminated (S12). If data are to be read for additional database fields, the method begins again with step S1. In FIG. 5, the next database field to be read (“Invoice Date”) is indicated.

With the method according to the preferred embodiment, the activity of a user in the manual transferring of data from a document into a database field is limited to the positioning of the pointer, the checking of the automatically proposed selection and the possible correction of the area, and the actuation of a key in order to transfer the data. The selection and the marking of the area of the string section to be selected are carried out automatically by the method according to the preferred embodiment.

FIGS. 2 to 5 show the transfer of data into an individual database field. However, by taking into account concept information, it is also possible to extract data for a plurality of database fields with a single string section. FIG. 6 shows a corresponding exemplary embodiment, in which the complete address is marked and read as a string section, the address being automatically segmented into the individual database fields name, company, street, postal code, and city.

In the following, another construction of the method described above, with which data can be extracted from tables, is explained on the basis of the flow diagram from FIG. 7 and the display screen representations according to FIGS. 8 and 9.

This method begins with step S15.

In step S16, the values of the table in the first table row are extracted according to the above method through the positioning of the pointer, the automatic selection and marking of the string section, and the transferring of the data into corresponding database fields. FIG. 8 shows a table in which the string sections of the first table row that have been transferred into the corresponding database fields are marked. These database fields have the structure of a table; for example, they are applied as a two-dimensional data field, so that during the extraction of the data into these database fields the method recognizes automatically, on the basis of the database field, that data are being read out from a table.

A row of a table can also extend over a plurality of pages if the table correspondingly extends over a plurality of pages. If the data of the first table row have been completely extracted, the user can initiate the automatic extraction of the further table entries using a predetermined input. If this input is actuated by the user, then, in step S17, first a list is created of all string sections that are situated under the first table row.

In step S18, a cost function is used to determine a cost value between sequences of string sections of the list and the sequence of the string sections of the first table row, on the basis of which data were extracted into the database fields in step S16. In this cost function, low costs are assigned to the sequences of the string sections of the list whose string sections agree with, or are at least very similar to, the corresponding string sections of the first table row, with respect to their horizontal position and their width. This cost value is thus inversely related to the degree of similarity between the sequences of string sections appearing in the list and the sequence of string sections contained in the first table row.

The cost function used here corresponds to the cost function described in Chapter 8.6.1 (“String Matching Allowing Errors”) of Modern Information Retrieval (ISBN 0-201-39829-X), with which an individual cost value between a string section of the first table row and a string section of the further table rows is determined. Because each sequence comprises a plurality of string sections, the Viterbi algorithm is used to calculate an overall cost value or overall similarity value for each of the individual sequences of string sections, through summation of the individual cost values.

On the basis of these cost values or similarity values, the sequences of string sections are determined as table rows whose similarity value lies beneath a predetermined threshold value (S19). In this way, all table rows, and thus table entries, of the table are determined. They are marked in step S20 (FIG. 9) and in step S21 they are extracted, i.e., automatically read out, converted into coded text if necessary, and stored in the corresponding database fields.
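Steps S18 and S19 can be illustrated with a simplified sketch. It does not reproduce the cost function of the cited book or the Viterbi computation named above; instead it swaps in a plain edit-distance-style dynamic program over cells. Every name and number here is an assumption: each string section is reduced to a tuple `(x, width)`, the individual cost is the absolute difference in horizontal position plus the absolute difference in width, and `skip_penalty` and `threshold` are arbitrary illustrative values.

```python
def cell_cost(reference_cell, candidate_cell):
    """Individual cost: difference in horizontal position plus difference in width."""
    return (abs(reference_cell[0] - candidate_cell[0])
            + abs(reference_cell[1] - candidate_cell[1]))


def row_cost(reference, candidate, skip_penalty=50.0):
    """Overall cost of aligning a candidate sequence with the first table row.

    Cells are matched in order; an unmatched cell on either side costs
    skip_penalty. Identical sequences have cost 0.
    """
    n, m = len(reference), len(candidate)
    table = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        table[i][0] = i * skip_penalty
    for j in range(1, m + 1):
        table[0][j] = j * skip_penalty
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            table[i][j] = min(
                table[i - 1][j - 1] + cell_cost(reference[i - 1], candidate[j - 1]),
                table[i - 1][j] + skip_penalty,  # reference cell has no partner
                table[i][j - 1] + skip_penalty,  # candidate cell has no partner
            )
    return table[n][m]


def find_table_rows(reference, candidates, threshold=30.0):
    """Keep the candidate sequences whose overall cost lies beneath the threshold."""
    return [row for row in candidates if row_cost(reference, row) < threshold]
```

In this sketch a line whose cells sit at nearly the same horizontal positions and have nearly the same widths as the first table row receives a low cost and is accepted as a table row, while a footer line with a single wide string section receives a high cost and is rejected.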

In step S22, this method is terminated.

Usefully, it is possible to post-process the table entries, i.e., to modify (move, enlarge, make smaller) the marked areas, or to remove or add individual rows. In the case of a post-processing, the entries in the database fields are automatically updated correspondingly.

In addition, during reading out of the data and entering into the database fields an additional check can take place through a mapping using the string matching method, with which it is determined how well the entries agree with the concept specified by the individual database fields.

In addition, the method according to the preferred embodiment can be combined with the method described in German patent application DE 103 42 594.2 for acquiring data from a plurality of machine-readable documents, for which reason reference is made to the complete content of this patent application, and it is incorporated into the present patent application by reference.

In this method for the automatic acquisition of data from a plurality of machine-readable documents, master documents are compared with a read document and their similarity is evaluated. The method applied here can also be used for reading out from a table, the sequence of the selected string sections corresponding to the first table row of the master document, and the combinations of string sections corresponding to the further table rows of the read documents.

In the above-described method according to the preferred embodiment for extracting data from tables, a user need merely move the pointer to the table entries in the first table row and confirm the transferring of the then automatically selected and marked string sections as data for the corresponding database field. After the user has done this for all table entries of the first table row, he need merely initiate the complete reading out of the further table entries by making an input. The method then automatically determines the further table entries, marks them, and extracts the data into the database.

This significantly accelerates the reading out of data from the table into a database. Method segment S17 to S21 therefore represents an independent preferred embodiment in its own right, which is however preferably applied in combination with the method represented in FIG. 1, to which step S16 relates.

FIG. 10 schematically shows a system for executing the method according to the preferred embodiment. This system 11 comprises a computer 12 having a storage device 13, having a central processor device (CPU) 14, and having an interface device 15. A scanner 16, a display screen 2, and an input device 17 are connected to computer 12. Input device 17 includes a keyboard 7 and/or a mouse 6.

In storage device 13, a software product is stored for executing the method according to the preferred embodiment, this software product being executed at CPU 14. Scanner 16 is used to acquire documents and to convert them into an electronic data file. These electronic data files are read by computer 12 and are preprocessed if necessary, using an OCR routine and/or a method for recognizing particular syntax or semantics in the data file. Subsequently, the documents contained in the data files are processed in a manner corresponding to the method described above, using system 11. At input device 17, the corresponding inputs can be carried out, these being limited to movements of pointer 5 and a few keyboard inputs. If necessary, the marked fields can be moved using the keyboard or the mouse, or can be adapted by enlargement or by being made smaller, or the coded text can be edited.

The present invention has been explained above on the basis of an exemplary preferred embodiment. Modifications thereof are possible within the scope of the present invention. Thus, for example, instead of frame 8 it is possible to provide only frame 10, in which the selected string section is shown in an enlarged manner. This frame 10 also represents a marking of the string section.

In the above-explained exemplary embodiment, the documents are scanned in and are then present in a graphic format. However, the method according to the preferred embodiment can also be used for reading information from documents that are already present in coded text, such as for example e-mails. Of course, given such an application it is not necessary for the documents to be converted into coded text using an OCR routine.

Consequently, the preferred embodiment can be briefly summarized as follows:

The preferred embodiment relates to a method for acquiring data from machine-readable documents, the data being assigned to a database.

With the preferred embodiment, string sections located in the vicinity of a pointer that can be moved by the user are automatically selected and marked, and their content is proposed for transfer into a database.

According to a development of the method according to the preferred embodiment, the content of a table can be read out in a fully automatic manner if the table entries in a first table row have already been read out according to the above method.

An exemplary preferred embodiment in various forms of the presentinvention have been described. Here it is clear that someone skilled inthe art can at any time indicate modifications and developments thatmake use of the concept of the preferred embodiment. In addition, thepreferred embodiment can be realized both by means of electroniccomponents (hardware) and through computer program elements (software orsoftware modules). In particular, the preferred embodiment is realizedhere as a combination of electronic hardware elements and softwareelements. Correspondingly, the preferred embodiment also includescomputer program products, such as for example electronic data carriers(CDs, DVDs, diskettes, tape drives), or components that are distributedvia computer networks (Internet) and/or on computers, and in particularare loaded into intermediate storage units and are kept ready thereand/or are run from there.

While a preferred embodiment has been illustrated and described in detail in the drawings and foregoing description, the same is to be considered as illustrative and not restrictive in character, it being understood that only the preferred embodiment has been shown and described and that all changes and modifications that come within the spirit of the invention both now or in the future are desired to be protected.

1-12. (canceled)
13. A method for acquiring data from a machine-readable document for assignment to fields of a database, comprising the steps of: extracting individual data substantially automatically from the document and entering the extracted data into the corresponding database fields; and if data cannot be extracted from the document with a desired degree of reliability from one or more particular database fields, executing the steps of displaying the document on a display screen, displaying on the display screen the one or more database fields for which the data cannot be extracted with said desired degree of reliability, and executing a proposal routine with which string sections in a vicinity of a pointer movable by a user on the display screen are selected, marked, and proposed for extraction.
14. A method according to claim 13 wherein the string section is selected, marked, and proposed for extraction in accordance with concept information assigned to the database field.
15. A method according to claim 14 wherein the concept information describes a syntax or semantics of the database field, so that the proposal routine selects and marks a string section that is to be marked in a manner corresponding to the syntax or to the semantics of the respective database field.
16. A method according to claim 15 wherein the information concerning syntax describes a number of numerals or letters or predetermined formats of the string section that is to be read.
17. A method according to claim 15 wherein the information concerning semantics describes specified terms.
18. A method according to claim 13 wherein a string section is selected that is situated between two limiting characters.
19. A method according to claim 18 wherein the limiting characters include empty characters or punctuation marks.
20. A method according to claim 13 wherein text of the document in graphic representation is first converted into coded text using an OCR method, and the proposal routine represents, in addition to the marked string section in graphic representation, coded text of said string section.
21. A method according to claim 13 wherein in addition to the marked string section, said string section is displayed again on the display screen in an enlarged representation.
22. A method according to claim 13 wherein after the marking of the string section, the proposal routine activates a function with which a content of the marked string section is transferred into the database through the actuation of one or more predetermined keys.
23. A method according to claim 13 wherein during the execution of the proposal routine, after movement of the pointer a predetermined time wait interval is observed, during which the pointer must not be moved, before a string section is selected.
24. A method according to claim 13 wherein a table is displayed as said document on the screen, and after data have been read from a first row of the table into corresponding database fields, further table entries of further table rows are automatically determined through a comparison of string sections situated under the first table row with the string sections of the first table row.
25. A method for acquiring data from a machine-readable document for assignment to at least one field of a database, comprising the steps of: extracting individual data from the document and entering the extracted data into at least one corresponding database field; and if data cannot be extracted from the document with a desired degree of reliability from at least one of the database fields, executing the steps of displaying the document on a display screen, displaying on the display screen the database field for which the data cannot be extracted with said desired degree of reliability, and executing a proposal routine with which string sections in a vicinity of a pointer movable by a user on the display screen are selected, marked, and proposed for extraction.